面向目标的意见单词提取(TOWE)是一项精细的情感分析任务,旨在从句子中提取给定意见目标的相应意见单词。最近,深度学习方法在这项任务上取得了显着进步。然而,由于昂贵的数据注释过程,TOWE任务仍然遭受培训数据的稀缺性。有限的标记数据增加了测试数据和培训数据之间分配变化的风险。在本文中,我们建议利用大量未标记的数据来通过增加模型对变化分布变化的暴露来降低风险。具体而言,我们提出了一种新型的多透明一致性正则化(MGCR)方法,以利用未标记的数据并设计两个专门用于TOWE的过滤器,以在不同的粒度上过滤嘈杂的数据。四个TOWE基准数据集的广泛实验结果表明,与当前的最新方法相比,MGCR的优越性。深入分析还证明了不同粒度过滤器的有效性。我们的代码可在https://github.com/towessl/towessl上找到。
translated by 谷歌翻译
半监督学习(SSL)通过利用大量未标记数据来增强有限标记的样品来改善模型的概括。但是,目前,流行的SSL评估协议通常受到计算机视觉(CV)任务的约束。此外,以前的工作通常从头开始训练深层神经网络,这是耗时且环境不友好的。为了解决上述问题,我们通过从简历,自然语言处理(NLP)和音频处理(AUDIO)中选择15种不同,具有挑战性和全面的任务来构建统一的SSL基准(USB),我们会系统地评估主导的SSL方法,以及开源的一个模块化和可扩展的代码库,以对这些SSL方法进行公平评估。我们进一步为简历任务提供了最新的神经模型的预训练版本,以使成本负担得起,以进行进一步调整。 USB启用对来自多个域的更多任务的单个SSL算法的评估,但成本较低。具体而言,在单个NVIDIA V100上,仅需要37个GPU天才能在USB中评估15个任务的FIXMATCH,而335 GPU天(除ImageNet以外的4个CV数据集中的279 GPU天)在使用典型协议的5个CV任务上需要进行5个CV任务。
translated by 谷歌翻译
由于训练和测试分布之间的不匹配,自动语音识别(ASR)的跨域性能可能会受到严重阻碍。由于目标域通常缺乏标记的数据,并且在声学和语言水平上存在域移位,因此对ASR进行无监督的域适应性(UDA)是一项挑战。先前的工作表明,通过利用未标记的数据的自我检查,自我监督的学习(SSL)或伪标记(PL)可以有效地进行UDA。但是,这些自我介绍也面临不匹配的域分布中的性能退化,而以前的工作未能解决。这项工作提出了一个系统的UDA框架,可以在预训练和微调范式中充分利用具有自学贴标签的未标记数据。一方面,我们应用持续的预训练和数据重播技术来减轻SSL预训练模型的域不匹配。另一方面,我们提出了一种基于PL技术的域自适应微调方法,并具有三种独特的修改:首先,我们设计了一种双分支PL方法,以降低对错误的伪标签的敏感性;其次,我们设计了一种不确定性感知的置信度过滤策略,以提高伪标签的正确性。第三,我们引入了两步PL方法,以结合目标域语言知识,从而产生更准确的目标域伪标记。各种跨域场景的实验结果表明,所提出的方法可以有效地提高跨域的性能,并显着超过以前的方法。
translated by 谷歌翻译
视觉识别任务中的长尾类分布对于如何处理头部和尾部类之间的偏置预测,即,模型倾向于将尾部类作为头部类进行分类。虽然现有的研究专注于数据重采采样和损失函数工程,但在本文中,我们采取了不同的视角:分类利润率。我们研究边距和注册之间的关系(分类得分)并经验遵守偏置边缘,并且偏置的Logits是正相关的。我们提出MARC,一个简单但有效的边缘校准函数,用于动态校准偏置边缘的偏置利润。我们通过对普通的长尾基准测试进行了广泛的实验,包括CIFAR-LT,Imagenet-LT,LT,以及不适物 - LT的广泛实验。实验结果表明,我们的MARC在这些基准上实现了有利的结果。此外,Marc只需三行代码即可实现。我们希望这种简单的方法能够激励人们重新思考偏置的边距和偏见的长尾视觉识别标识。
translated by 谷歌翻译
交叉语言语音适应旨在解决利用多种丰富资源语言来构建低资源目标语言的模型的问题。由于低资源语言具有有限的培训数据,语音识别模型可以容易地过度装备。在本文中,我们建议使用适配器来研究多种适配器的性能,用于参数有效的交叉语音语音适应。基于我们以前的MetaAdapter,隐含地利用适配器,我们提出了一种名为SimAdapter的新算法,用于从Adapters明确学习知识。我们的算法利用了可以轻松集成到变压器结构中的适配器.METAADAPTER利用元学习将一般知识从训练数据转移到测试语言。 SimAdapter旨在使用适配器微调期间了解源语言与目标语言之间的相似性。我们在公共语音数据集中对五种低资源语言进行广泛的实验。结果表明,与强大的全型微调基线相比,我们的MetaAdapter和SimAdapter方法可以将WER减小2.98%和2.55%,只有2.5%和15.5%的培训参数。此外,我们还表明这两种新型算法可以集成,以便更好的性能,相对减少高达3.55%。
translated by 谷歌翻译
In the field of cross-modal retrieval, single encoder models tend to perform better than dual encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual encoder model called BagFormer that utilizes a cross modal interaction mechanism to improve recall performance without sacrificing latency and throughput. BagFormer achieves this through the use of bag-wise interactions, which allow for the transformation of text to a more appropriate granularity and the incorporation of entity knowledge into the model. Our experiments demonstrate that BagFormer is able to achieve results comparable to state-of-the-art single encoder models in cross-modal retrieval tasks, while also offering efficient training and inference with 20.72 times lower latency and 25.74 times higher throughput.
translated by 谷歌翻译
The past few years have witnessed the prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the community of 3D point cloud learning. Different from previous pre-training pipelines for 3D point clouds that generally fall into the scope of either generative modeling or contrastive learning, in this paper, we investigate a translative pre-training paradigm, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from an input 3D object point cloud to its diverse forms of 2D rendered images (e.g., silhouette, depth, contour). Specifically, we begin with deducing view-conditioned point-wise embeddings via the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which is further fed into the subsequent 2D convolutional translation heads for image generation. We conduct extensive experiments on common task scenarios of 3D shape analysis, where our PointVST shows consistent and prominent performance superiority over current state-of-the-art methods under diverse evaluation protocols. Our code will be made publicly available.
translated by 谷歌翻译
This paper utilizes an anomaly detection algorithm to check if underwater gliders are operating normally in the unknown ocean environment. Glider pilots can be warned of the detected glider anomaly in real time, thus taking over the glider appropriately and avoiding further damage to the glider. The adopted algorithm is validated by two valuable sets of data in real glider deployments, the University of South Florida (USF) glider Stella and the Skidaway Institute of Oceanography (SkIO) glider Angus.
translated by 谷歌翻译
Blind watermarking provides powerful evidence for copyright protection, image authentication, and tampering identification. However, it remains a challenge to design a watermarking model with high imperceptibility and robustness against strong noise attacks. To resolve this issue, we present a framework Combining the Invertible and Non-invertible (CIN) mechanisms. The CIN is composed of the invertible part to achieve high imperceptibility and the non-invertible part to strengthen the robustness against strong noise attacks. For the invertible part, we develop a diffusion and extraction module (DEM) and a fusion and split module (FSM) to embed and extract watermarks symmetrically in an invertible way. For the non-invertible part, we introduce a non-invertible attention-based module (NIAM) and the noise-specific selection module (NSM) to solve the asymmetric extraction under a strong noise attack. Extensive experiments demonstrate that our framework outperforms the current state-of-the-art methods of imperceptibility and robustness significantly. Our framework can achieve an average of 99.99% accuracy and 67.66 dB PSNR under noise-free conditions, while 96.64% and 39.28 dB combined strong noise attacks. The code will be available in https://github.com/rmpku/CIN.
translated by 谷歌翻译
Our situated environment is full of uncertainty and highly dynamic, thus hindering the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real world scenarios. This means IDM should have the capability of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI) breaking the barriers between tasks and applications. Recent research has well-examined neural architecture, Transformer, as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as a sequence decoding task using the Transformer architecture; this would be a promising solution to advance the applications of IDM in more complex real world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of a FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1) with 1.2 billion parameters, which achieves human-level performance over 453 tasks, including text generation, images caption, video games playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real world IDM applications.
translated by 谷歌翻译